Chapter 21: Memoization and Dynamic Programming

The Performance Crisis: When Recursion Becomes Unusable

We'll begin with a seemingly innocent problem: calculating Fibonacci numbers. This will become our anchor example for the entire chapter: a function we'll refine through multiple iterations as we discover its catastrophic performance problems and learn systematic techniques to fix them.

The Fibonacci sequence is defined as:

- F(0) = 0
- F(1) = 1
- F(n) = F(n-1) + F(n-2) for n > 1

This produces the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89...

Let's implement this directly from the mathematical definition.

def fibonacci(n):
    """Calculate the nth Fibonacci number using pure recursion."""
    # Base cases
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Recursive case: F(n) = F(n-1) + F(n-2)
    return fibonacci(n - 1) + fibonacci(n - 2)

# Test with small values
print("Testing fibonacci function:")
for i in range(8):
    print(f"F({i}) = {fibonacci(i)}")

Output:

Testing fibonacci function:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13

Perfect! The function works correctly. The logic is clean and directly mirrors the mathematical definition. This is recursion at its most elegant.

Now let's try calculating F(35):

import time

# Time the calculation
start = time.time()
result = fibonacci(35)
elapsed = time.time() - start

print(f"F(35) = {result}")
print(f"Time taken: {elapsed:.2f} seconds")

Output:

F(35) = 9227465
Time taken: 3.47 seconds

Wait. 3.47 seconds to calculate a single number? That's concerning. Let's try F(40):

start = time.time()
result = fibonacci(40)
elapsed = time.time() - start

print(f"F(40) = {result}")
print(f"Time taken: {elapsed:.2f} seconds")

Output:

F(40) = 102334155
Time taken: 38.21 seconds

38 seconds! And if we tried F(50), we'd be waiting for hours. Our elegant recursive solution has become completely unusable for even moderately sized inputs.

This is the performance crisis that motivates everything in this chapter. We have correct code that's too slow to use in practice.

Diagnostic Analysis: Understanding the Exponential Explosion

The Complete Execution Trace

Let's trace what happens when we call fibonacci(5) to understand why performance degrades so catastrophically:

Recursion Tree:

                          fib(5)
                         /      \
                    fib(4)        fib(3)
                   /      \       /     \
              fib(3)    fib(2) fib(2)  fib(1)
             /     \    /    \  /    \
         fib(2) fib(1) fib(1) fib(0) fib(1) fib(0)
         /    \
     fib(1) fib(0)

Let's examine this tree piece by piece:

1. The Redundant Computation Pattern

Look at fib(3) in the tree above. It appears twice: once as the left child of fib(4) and once as the right child of fib(5). This means we calculate fib(3) completely twice, from scratch each time.

Now look at fib(2). It appears three times in the tree. We calculate it three separate times, each time recursing down to fib(1) and fib(0).

What this tells us: Every subproblem is being recalculated multiple times. The work is being duplicated exponentially.

2. Counting the Function Calls

Let's add instrumentation to count how many times each value is computed:

call_count = {}

def fibonacci_instrumented(n):
    """Fibonacci with call counting."""
    # Track this call
    call_count[n] = call_count.get(n, 0) + 1

    if n == 0:
        return 0
    if n == 1:
        return 1

    return fibonacci_instrumented(n - 1) + fibonacci_instrumented(n - 2)

# Calculate F(10) and see how many calls each value requires
call_count = {}
result = fibonacci_instrumented(10)

print(f"F(10) = {result}")
print(f"\nFunction calls per value:")
for i in sorted(call_count.keys()):
    print(f"  F({i}): called {call_count[i]} times")
print(f"\nTotal function calls: {sum(call_count.values())}")

Output:

F(10) = 55

Function calls per value:
  F(0): called 34 times
  F(1): called 55 times
  F(2): called 34 times
  F(3): called 21 times
  F(4): called 13 times
  F(5): called 8 times
  F(6): called 5 times
  F(7): called 3 times
  F(8): called 2 times
  F(9): called 1 times
  F(10): called 1 times

Total function calls: 177

Critical insight: To calculate F(10), we made 177 function calls. But there are only 11 unique values (0 through 10). We're doing the same work over and over again.

Look at F(0): we calculated it 34 times. F(1) was calculated 55 times. Each of these calculations returns the same result every single time, yet we recompute it from scratch.

3. The Growth Rate

Let's see how the number of calls grows:

def count_calls(n):
    """Count total function calls needed to compute F(n)."""
    call_count = {}

    def fib(n):
        call_count[n] = call_count.get(n, 0) + 1
        if n <= 1:
            return n
        return fib(n - 1) + fib(n - 2)

    fib(n)
    return sum(call_count.values())

print("Growth of function calls:")
for n in range(0, 21):
    calls = count_calls(n)
    print(f"F({n:2d}): {calls:6d} calls")

Output:

Growth of function calls:
F( 0):      1 calls
F( 1):      1 calls
F( 2):      3 calls
F( 3):      5 calls
F( 4):      9 calls
F( 5):     15 calls
F( 6):     25 calls
F( 7):     41 calls
F( 8):     67 calls
F( 9):    109 calls
F(10):    177 calls
F(11):    287 calls
F(12):    465 calls
F(13):    753 calls
F(14):   1219 calls
F(15):   1973 calls
F(16):   3193 calls
F(17):   5167 calls
F(18):   8361 calls
F(19):  13529 calls
F(20):  21891 calls

The pattern: The number of calls grows by a factor of about 1.6 (the golden ratio) every time we increase n by 1. This is exponential growth.

For F(20), we make nearly 22,000 function calls. For F(40), we'd make over 300 million function calls. This is why F(40) takes 38 seconds.

4. The Recurrence Relation

The number of calls C(n) follows this pattern:

- C(0) = 1 (base case)
- C(1) = 1 (base case)
- C(n) = C(n-1) + C(n-2) + 1 (two recursive calls plus the current call)

This recurrence relation itself grows like the Fibonacci sequence! The time complexity is O(φⁿ) where φ ≈ 1.618 is the golden ratio. This is exponential growth.
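We can check the golden-ratio growth empirically. Here's a small self-contained sketch (a variant of the count_calls idea that returns a plain total) printing the ratio of successive call counts, which should approach φ:

```python
# Empirical check: the total call count C(n) should grow by a factor
# approaching the golden ratio phi as n increases.

def total_calls(n):
    """Total function calls made by naive recursive Fibonacci."""
    calls = 0

    def fib(k):
        nonlocal calls
        calls += 1
        if k <= 1:
            return k
        return fib(k - 1) + fib(k - 2)

    fib(n)
    return calls

phi = (1 + 5 ** 0.5) / 2  # golden ratio, ~1.6180

for n in [10, 15, 20, 25]:
    ratio = total_calls(n) / total_calls(n - 1)
    print(f"C({n}) / C({n - 1}) = {ratio:.4f}   (phi = {phi:.4f})")
```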

Root Cause Identified

The problem: We have overlapping subproblems. The same values (like F(3), F(2), F(1), F(0)) are computed many times, but each computation is independent; we throw away the result and recalculate it when needed again.

Why the current approach can't solve this: Pure recursion has no memory. Each function call is isolated and doesn't know that another call already computed the same value.

What we need: A way to remember results we've already computed, so we can reuse them instead of recalculating. This is called memoization.

Iteration 1: Manual Memoization with a Dictionary

Current State Recap

Our fibonacci() function works correctly but has exponential time complexity due to redundant calculations. We need to cache results.

The Memoization Concept

Memoization is the technique of storing the results of expensive function calls and returning the cached result when the same inputs occur again.

The pattern:

1. Before computing, check if we've already computed this value
2. If yes, return the cached result immediately
3. If no, compute it, store it in the cache, then return it

Let's implement this with a dictionary:

def fibonacci_memo(n, memo=None):
    """Fibonacci with manual memoization using a dictionary."""
    # Initialize memo dictionary on first call
    if memo is None:
        memo = {}

    # Check if we've already computed this value
    if n in memo:
        return memo[n]

    # Base cases
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Compute and store the result
    result = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    memo[n] = result

    return result

# Test correctness
print("Testing memoized fibonacci:")
for i in range(8):
    print(f"F({i}) = {fibonacci_memo(i)}")

Output:

Testing memoized fibonacci:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13

Correct! Now let's test performance:

import time

# Compare performance
def benchmark_comparison():
    test_values = [10, 20, 30, 35]

    print("Performance Comparison:")
    print("-" * 60)

    for n in test_values:
        # Memoized version
        start = time.time()
        result_memo = fibonacci_memo(n)
        time_memo = time.time() - start

        # Original version (only for small n)
        if n <= 30:
            start = time.time()
            result_orig = fibonacci(n)
            time_orig = time.time() - start
            speedup = time_orig / time_memo if time_memo > 0 else float('inf')
            print(f"F({n}):")
            print(f"  Original: {time_orig:.6f}s")
            print(f"  Memoized: {time_memo:.6f}s")
            print(f"  Speedup: {speedup:.0f}x faster")
        else:
            print(f"F({n}):")
            print(f"  Original: (too slow to measure)")
            print(f"  Memoized: {time_memo:.6f}s")
        print()

benchmark_comparison()

Output:

Performance Comparison:
------------------------------------------------------------
F(10):
  Original: 0.000031s
  Memoized: 0.000008s
  Speedup: 4x faster

F(20):
  Original: 0.002156s
  Memoized: 0.000012s
  Speedup: 180x faster

F(30):
  Original: 0.261847s
  Memoized: 0.000018s
  Speedup: 14547x faster

F(35):
  Original: (too slow to measure)
  Memoized: 0.000021s

Dramatic improvement! F(30) went from 0.26 seconds to 0.000018 seconds, over 14,000 times faster. And F(35), which took 3.47 seconds before, now completes in 0.000021 seconds.

Let's verify we can now handle much larger values:

# Test with large values
large_values = [50, 100, 500, 1000]

print("Large Fibonacci numbers (memoized):")
for n in large_values:
    start = time.time()
    result = fibonacci_memo(n)
    elapsed = time.time() - start

    # Show first and last 20 digits for very large numbers
    result_str = str(result)
    if len(result_str) > 50:
        display = f"{result_str[:20]}...{result_str[-20:]}"
    else:
        display = result_str

    print(f"F({n:4d}) = {display}")
    print(f"         ({len(result_str)} digits, {elapsed:.6f}s)")

Output:

Large Fibonacci numbers (memoized):
F(  50) = 12586269025
         (11 digits, 0.000029s)
F( 100) = 354224848179261915075
         (21 digits, 0.000053s)
F( 500) = 13942322456169788013...86445300443450871
         (105 digits, 0.000242s)
F(1000) = 43466557686937456435...84865098707235476
         (209 digits, 0.000476s)

Incredible! We can now compute F(1000) in less than a millisecond. The original version would have taken longer than the age of the universe.

Call Stack Visualization

Let's trace how memoization changes the execution pattern for fibonacci_memo(5):

First call to F(5) - Building the cache:

fibonacci_memo(5)
  memo = {}
  5 not in memo, compute it
  ↓ fibonacci_memo(4)
    4 not in memo, compute it
    ↓ fibonacci_memo(3)
      3 not in memo, compute it
      ↓ fibonacci_memo(2)
        2 not in memo, compute it
        ↓ fibonacci_memo(1) β†’ returns 1 (base case)
        ↓ fibonacci_memo(0) β†’ returns 0 (base case)
        ← 1 + 0 = 1, store memo[2] = 1
      ↓ fibonacci_memo(1) β†’ returns 1 (base case)
      ← 1 + 1 = 2, store memo[3] = 2
    ↓ fibonacci_memo(2) β†’ returns 1 (found in memo!)
    ← 2 + 1 = 3, store memo[4] = 3
  ↓ fibonacci_memo(3) β†’ returns 2 (found in memo!)
  ← 3 + 2 = 5, store memo[5] = 5
Result: 5

Key observation: Once we compute F(2), every subsequent call to F(2) returns immediately from the cache. Same for F(3). We compute each value exactly once.

Let's verify this with instrumentation:

call_count_memo = {}

def fibonacci_memo_instrumented(n, memo=None):
    """Memoized fibonacci that counts actual computations."""
    if memo is None:
        memo = {}

    # Cache hits return here without being counted
    if n in memo:
        return memo[n]

    # Track this computation
    call_count_memo[n] = call_count_memo.get(n, 0) + 1

    if n <= 1:
        memo[n] = n  # memoize the base cases too, so repeat calls are hits
        return n

    result = fibonacci_memo_instrumented(n - 1, memo) + fibonacci_memo_instrumented(n - 2, memo)
    memo[n] = result
    return result

# Compare call counts
call_count_memo = {}
result = fibonacci_memo_instrumented(10)

print(f"F(10) = {result}")
print(f"\nMemoized function calls per value:")
for i in sorted(call_count_memo.keys()):
    print(f"  F({i}): called {call_count_memo[i]} times")
print(f"\nTotal function calls: {sum(call_count_memo.values())}")

Output:

F(10) = 55

Memoized function calls per value:
  F(0): called 1 times
  F(1): called 1 times
  F(2): called 1 times
  F(3): called 1 times
  F(4): called 1 times
  F(5): called 1 times
  F(6): called 1 times
  F(7): called 1 times
  F(8): called 1 times
  F(9): called 1 times
  F(10): called 1 times

Total function calls: 11

Compare this to the original: 177 calls reduced to 11 calls. Each value computed exactly once!

Complexity Analysis

Time Complexity: O(n)

- Each value from 0 to n is computed exactly once
- Each computation does O(1) work (addition and dictionary operations)
- Total: O(n)

Space Complexity: O(n)

- The dictionary stores n values
- Call stack depth: O(n) in the worst case (computing F(n) requires computing F(n-1), F(n-2), ..., F(0))
- Total: O(n)

Improvement: From O(φⁿ) exponential time to O(n) linear timeβ€”a transformation from unusable to instant.

Limitation Preview

This works beautifully, but there's a subtle issue: we're passing the memo dictionary as a parameter through every recursive call. This works, but it's verbose and error-prone. What if we forget to pass memo in one of the recursive calls? The memoization breaks.

There's a cleaner way using Python's built-in decorator...

Iteration 2: Automatic Memoization with @lru_cache

Current State Recap

Our manual memoization works perfectly, but requires:

- Passing the memo parameter through all recursive calls
- Manual cache checking and storing
- Careful initialization of the memo dictionary

The Decorator Solution

Python's functools module provides @lru_cache, a decorator that automatically memoizes any function. LRU stands for "Least Recently Used": it's a cache with a maximum size that evicts the least recently used items when full.

Let's see how much simpler this makes our code:

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    """Fibonacci with automatic memoization via @lru_cache."""
    if n == 0:
        return 0
    if n == 1:
        return 1

    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

# Test it
print("Testing @lru_cache version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_cached(i)}")

Output:

Testing @lru_cache version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13

What changed: We removed all the manual memoization code! No memo parameter, no cache checking, no storing results. The decorator handles everything.

The maxsize=None parameter means unlimited cache size. For production code with bounded inputs, you might use maxsize=128 or similar to limit memory usage.
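As an aside, on Python 3.9 and newer, functools.cache is shorthand for lru_cache(maxsize=None) if you want the unbounded behavior with less typing:

```python
from functools import cache  # Python 3.9+

@cache  # equivalent to @lru_cache(maxsize=None)
def fib(n):
    """Fibonacci with unbounded automatic memoization."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(f"F(30) = {fib(30)}")
```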

Performance Verification

import time

# Benchmark the decorator version
test_values = [50, 100, 500, 1000]

print("Performance with @lru_cache:")
for n in test_values:
    start = time.time()
    result = fibonacci_cached(n)
    elapsed = time.time() - start

    result_str = str(result)
    if len(result_str) > 50:
        display = f"{result_str[:20]}...{result_str[-20:]}"
    else:
        display = result_str

    print(f"F({n:4d}) = {display}")
    print(f"         ({len(result_str)} digits, {elapsed:.6f}s)")

Output:

Performance with @lru_cache:
F(  50) = 12586269025
         (11 digits, 0.000027s)
F( 100) = 354224848179261915075
         (21 digits, 0.000051s)
F( 500) = 13942322456169788013...86445300443450871
         (105 digits, 0.000238s)
F(1000) = 43466557686937456435...84865098707235476
         (209 digits, 0.000469s)

Identical performance to our manual version, but with much cleaner code!

Cache Inspection

The @lru_cache decorator adds a cache_info() method that lets us inspect cache statistics:

# Clear the cache and recompute
fibonacci_cached.cache_clear()

# Compute F(20)
result = fibonacci_cached(20)

# Check cache statistics
info = fibonacci_cached.cache_info()
print(f"F(20) = {result}")
print(f"\nCache statistics:")
print(f"  Hits: {info.hits}")
print(f"  Misses: {info.misses}")
print(f"  Cache size: {info.currsize}")
print(f"  Max size: {info.maxsize}")
print(f"\nHit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")

Output:

F(20) = 6765

Cache statistics:
  Hits: 18
  Misses: 21
  Cache size: 21
  Max size: None

Hit rate: 46.2%

Interpretation:

- Misses: 21. We computed 21 unique values (F(0) through F(20))
- Hits: 18. We retrieved cached results 18 times
- Hit rate: 46.2%. Nearly half of all function calls were served from the cache

This hit rate might seem low, but remember: without memoization, we'd have made thousands of calls for F(20). With memoization, we made only 39 total calls (21 misses + 18 hits).

Before/After Code Comparison

Before (Manual memoization):

def fibonacci_memo(n, memo=None):
    if memo is None:
        memo = {}

    if n in memo:
        return memo[n]

    if n == 0:
        return 0
    if n == 1:
        return 1

    result = fibonacci_memo(n - 1, memo) + fibonacci_memo(n - 2, memo)
    memo[n] = result
    return result

After (Decorator):

@lru_cache(maxsize=None)
def fibonacci_cached(n):
    if n == 0:
        return 0
    if n == 1:
        return 1

    return fibonacci_cached(n - 1) + fibonacci_cached(n - 2)

Improvement: The decorator version is cleaner, less error-prone, and provides built-in cache inspection tools.

When to Use @lru_cache

Use it when:

- The function has no side effects (pure function)
- The same inputs always produce the same outputs
- The function is called repeatedly with the same arguments
- Arguments are hashable (can be dictionary keys)

Don't use it when:

- The function has side effects (prints, modifies global state, I/O)
- Arguments are unhashable (lists, dicts, sets)
- Memory is extremely constrained
- You need custom cache eviction logic

Limitation Preview

Both our memoized versions (manual and decorator) use top-down recursion: we start with the target value and recurse down to base cases. This is intuitive and matches the mathematical definition.

But there's an alternative approach: bottom-up iteration. Instead of recursing from F(n) down to F(0), we could iterate from F(0) up to F(n). This eliminates recursion entirely and can be even more efficient...

Iteration 3: Bottom-Up Dynamic Programming

Current State Recap

Our memoized recursive solutions are fast (O(n) time), but they still use the call stack. For very large n (like F(10000)), we might hit Python's recursion limit.

The Bottom-Up Approach

Instead of recursing from F(n) down to base cases, we can iterate from base cases up to F(n). This is called bottom-up dynamic programming.

The insight: To compute F(n), we only need F(n-1) and F(n-2). So we can:

1. Start with F(0) = 0 and F(1) = 1
2. Compute F(2) = F(1) + F(0)
3. Compute F(3) = F(2) + F(1)
4. Continue until we reach F(n)

No recursion needed!

def fibonacci_bottom_up(n):
    """Fibonacci using bottom-up dynamic programming (iteration)."""
    # Handle base cases
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Build up from base cases
    # We'll store all values from F(0) to F(n)
    dp = [0] * (n + 1)
    dp[0] = 0
    dp[1] = 1

    # Compute each value using previous two
    for i in range(2, n + 1):
        dp[i] = dp[i - 1] + dp[i - 2]

    return dp[n]

# Test correctness
print("Testing bottom-up version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_bottom_up(i)}")

Output:

Testing bottom-up version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13

Correct! Now let's test with large values:

import time

# Test with very large values
test_values = [1000, 5000, 10000]

print("Bottom-up performance:")
for n in test_values:
    start = time.time()
    result = fibonacci_bottom_up(n)
    elapsed = time.time() - start

    result_str = str(result)
    print(f"F({n:5d}): {len(result_str):5d} digits, {elapsed:.6f}s")

Output:

Bottom-up performance:
F( 1000):   209 digits, 0.000521s
F( 5000):  1045 digits, 0.012847s
F(10000):  2090 digits, 0.051234s

Success! We can compute F(10000) without hitting recursion limits. The memoized recursive version would fail:

# Try the recursive version with large n
try:
    result = fibonacci_cached(5000)
    print("Recursive version succeeded")
except RecursionError as e:
    print(f"Recursive version failed: {e}")

Output:

Recursive version failed: maximum recursion depth exceeded in comparison

The recursive version fails because Python's default recursion limit is 1000. We'd need to increase it with sys.setrecursionlimit(), which is risky and can cause crashes.
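For completeness, here's what raising the limit looks like. This is a sketch, not a recommendation: the specific numbers (a limit of 10,000 for n = 5000) are illustrative, and an over-deep Python stack can still crash the interpreter on some platforms.

```python
import sys

def fib_memo(n, memo=None):
    """Top-down memoized Fibonacci (recursion depth is about n)."""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(10_000)  # headroom for a recursion depth of ~5000
try:
    result = fib_memo(5000)
finally:
    sys.setrecursionlimit(old_limit)  # always restore the original limit

print(f"F(5000) has {len(str(result))} digits")
```

The try/finally restores the limit even if the computation fails, so the raised limit doesn't leak into the rest of the program.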

The bottom-up version has no recursion limit because it uses iteration.

Execution Trace Comparison

Top-down (recursive with memoization):

fibonacci_cached(5)
  ↓ fibonacci_cached(4)
    ↓ fibonacci_cached(3)
      ↓ fibonacci_cached(2)
        ↓ fibonacci_cached(1) β†’ 1
        ↓ fibonacci_cached(0) β†’ 0
        ← 1
      ↓ fibonacci_cached(1) β†’ 1 (cached)
      ← 2
    ↓ fibonacci_cached(2) β†’ 1 (cached)
    ← 3
  ↓ fibonacci_cached(3) β†’ 2 (cached)
  ← 5

Bottom-up (iterative):

fibonacci_bottom_up(5)
  dp[0] = 0
  dp[1] = 1
  dp[2] = dp[1] + dp[0] = 1
  dp[3] = dp[2] + dp[1] = 2
  dp[4] = dp[3] + dp[2] = 3
  dp[5] = dp[4] + dp[3] = 5
  return dp[5] = 5

Key difference: Top-down builds the call stack and relies on memoization to avoid recomputation. Bottom-up builds a table iteratively with no call stack overhead.

Space Optimization

Our current bottom-up version stores all values from F(0) to F(n) in the dp array. But notice: to compute F(i), we only need F(i-1) and F(i-2). We don't need to keep all previous values!

Let's optimize space:

def fibonacci_optimized(n):
    """Space-optimized bottom-up Fibonacci using only two variables."""
    if n == 0:
        return 0
    if n == 1:
        return 1

    # Only keep track of the last two values
    prev2 = 0  # F(i-2)
    prev1 = 1  # F(i-1)

    for i in range(2, n + 1):
        current = prev1 + prev2
        # Shift the window
        prev2 = prev1
        prev1 = current

    return prev1

# Test correctness
print("Testing space-optimized version:")
for i in range(8):
    print(f"F({i}) = {fibonacci_optimized(i)}")

# Test with large value
start = time.time()
result = fibonacci_optimized(10000)
elapsed = time.time() - start
print(f"\nF(10000): {len(str(result))} digits, {elapsed:.6f}s")

Output:

Testing space-optimized version:
F(0) = 0
F(1) = 1
F(2) = 1
F(3) = 2
F(4) = 3
F(5) = 5
F(6) = 8
F(7) = 13

F(10000): 2090 digits, 0.049876s

Space improvement: From O(n) space (storing all values) to O(1) space (storing only two values).

Complexity Analysis Comparison

Approach                 Time    Space (data)  Space (call stack)  Recursion limit?
Naive recursion          O(φⁿ)   O(1)          O(n)                Yes
Top-down memoized        O(n)    O(n)          O(n)                Yes
Bottom-up (full array)   O(n)    O(n)          O(1)                No
Bottom-up (optimized)    O(n)    O(1)          O(1)                No

Trade-offs:

- Top-down is more intuitive (matches the mathematical definition)
- Bottom-up avoids recursion limits
- Space-optimized bottom-up uses minimal memory
- Top-down can skip computing unnecessary values (if we only need F(n), not all values up to n)

When to Apply Each Solution

Use top-down memoization when:

- The recursive structure matches the problem naturally
- You might not need all subproblems (sparse computation)
- Recursion depth is manageable (n < 1000 in Python)
- Code clarity is more important than maximum performance

Use bottom-up DP when:

- You need all subproblems anyway
- Recursion depth might be a problem
- You want maximum performance
- You can identify a clear iteration order

Use space-optimized bottom-up when:

- Memory is constrained
- You only need the final result (not intermediate values)
- The problem has a "sliding window" structure (only the last k values are needed)
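To illustrate the sliding-window case with k = 3, here's a hypothetical "tribonacci" variant (each term is the sum of the previous three; the function name and base cases are illustrative, not from the text above) keeping only the last three values in a deque:

```python
from collections import deque

def tribonacci(n):
    """T(0)=0, T(1)=1, T(2)=1, and T(n) = T(n-1) + T(n-2) + T(n-3).
    Bottom-up with a window of the last three values: O(n) time, O(1) space."""
    if n == 0:
        return 0
    if n in (1, 2):
        return 1
    window = deque([0, 1, 1], maxlen=3)  # T(0), T(1), T(2)
    for _ in range(3, n + 1):
        window.append(sum(window))  # maxlen=3 evicts the oldest value
    return window[-1]

print([tribonacci(i) for i in range(10)])
```

The deque's maxlen handles the window shifting automatically; for k = 2 this degenerates into the two-variable Fibonacci version above.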

Recognizing Overlapping Subproblems in Other Problems

The Pattern: When Memoization Helps

Fibonacci was our anchor example, but the technique applies broadly. Memoization helps when:

  1. Overlapping subproblems: Same subproblems are solved multiple times
  2. Optimal substructure: Solution to problem can be built from solutions to subproblems
  3. Pure function: Same inputs always produce same outputs
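These three conditions are exactly what a generic cache relies on. Here's a minimal hand-rolled memoize decorator, a simplified sketch of what @lru_cache does, limited to hashable positional arguments:

```python
import functools

def memoize(fn):
    """Cache results of a pure function, keyed by its positional arguments."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args):
        if args not in cache:
            cache[args] = fn(*args)  # compute once; reuse thereafter
        return cache[args]

    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(f"F(100) = {fib(100)}")
```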

Let's see this pattern in other problems.

Problem 2: Climbing Stairs

You're climbing a staircase with n steps. You can climb 1 or 2 steps at a time. How many distinct ways can you reach the top?

Naive Recursive Solution

def climb_stairs_naive(n):
    """Count ways to climb n stairs (1 or 2 steps at a time)."""
    # Base cases
    if n == 0:
        return 1  # One way: don't climb
    if n == 1:
        return 1  # One way: one step

    # Recursive case: can arrive from (n-1) or (n-2)
    return climb_stairs_naive(n - 1) + climb_stairs_naive(n - 2)

# Test
print("Ways to climb stairs (naive):")
for i in range(8):
    print(f"  {i} stairs: {climb_stairs_naive(i)} ways")

# Time a larger value
import time
start = time.time()
result = climb_stairs_naive(30)
elapsed = time.time() - start
print(f"\n30 stairs: {result} ways ({elapsed:.3f}s)")

Output:

Ways to climb stairs (naive):
  0 stairs: 1 ways
  1 stairs: 1 ways
  2 stairs: 2 ways
  3 stairs: 3 ways
  4 stairs: 5 ways
  5 stairs: 8 ways
  6 stairs: 13 ways
  7 stairs: 21 ways

30 stairs: 1346269 ways (0.267s)

Notice: The sequence is 1, 1, 2, 3, 5, 8, 13, 21... This is the Fibonacci sequence, shifted by one position! The problem has the same recursive structure.

And like Fibonacci, it's slow for large n. Let's apply memoization:

from functools import lru_cache

@lru_cache(maxsize=None)
def climb_stairs_memo(n):
    """Count ways to climb n stairs with memoization."""
    if n == 0:
        return 1
    if n == 1:
        return 1

    return climb_stairs_memo(n - 1) + climb_stairs_memo(n - 2)

# Test performance
start = time.time()
result = climb_stairs_memo(100)
elapsed = time.time() - start
print(f"100 stairs: {result} ways ({elapsed:.6f}s)")

start = time.time()
result = climb_stairs_memo(500)
elapsed = time.time() - start
print(f"500 stairs: {result} ways ({elapsed:.6f}s)")

Output:

100 stairs: 573147844013817084101 ways (0.000052s)
500 stairs: 2171430676560690477...8640344181598136297 ways (0.000241s)

Instant! From 0.267s for n=30 to 0.000241s for n=500.
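Since the recurrence has the same shape as Fibonacci, the space-optimized bottom-up version from Iteration 3 carries over directly; a sketch:

```python
def climb_stairs(n):
    """Ways to climb n stairs taking 1 or 2 steps: bottom-up, O(1) space."""
    prev2, prev1 = 1, 1  # ways(0), ways(1)
    for _ in range(2, n + 1):
        prev2, prev1 = prev1, prev2 + prev1  # shift the two-value window
    return prev1

print([climb_stairs(i) for i in range(8)])
print(f"30 stairs: {climb_stairs(30)} ways")
```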

Problem 3: Coin Change

Given coins of different denominations and a target amount, find the minimum number of coins needed to make that amount.

Example: coins = [1, 5, 10, 25], amount = 30
Answer: 2 coins (25 + 5)

Naive Recursive Solution

def coin_change_naive(coins, amount):
    """Find minimum coins needed to make amount (naive recursion)."""
    # Base cases
    if amount == 0:
        return 0  # No coins needed
    if amount < 0:
        return float('inf')  # Impossible

    # Try each coin and take the minimum
    min_coins = float('inf')
    for coin in coins:
        result = coin_change_naive(coins, amount - coin)
        if result != float('inf'):
            min_coins = min(min_coins, result + 1)

    return min_coins

# Test
coins = [1, 5, 10, 25]
test_amounts = [1, 6, 11, 30]

print("Minimum coins needed (naive):")
for amount in test_amounts:
    result = coin_change_naive(coins, amount)
    print(f"  Amount {amount}: {result} coins")

Output:

Minimum coins needed (naive):
  Amount 1: 1 coins
  Amount 6: 2 coins
  Amount 11: 2 coins
  Amount 30: 2 coins

Correct! But let's check performance:

# Time a larger amount
start = time.time()
result = coin_change_naive(coins, 50)
elapsed = time.time() - start
print(f"Amount 50: {result} coins ({elapsed:.3f}s)")

Output:

Amount 50: 2 coins (0.847s)

Slow! Let's trace why. For amount=50 with coins=[1,5,10,25]:

Recursion tree (partial):

coin_change(50)
├─ coin_change(49)  [used coin 1]
│  ├─ coin_change(48)
│  │  ├─ coin_change(47)
│  │  │  └─ ... (many branches)
│  │  ├─ coin_change(43)  [used coin 5]
│  │  └─ ...
│  ├─ coin_change(44)  [used coin 5]
│  └─ ...
├─ coin_change(45)  [used coin 5]
│  ├─ coin_change(44)  [DUPLICATE!]
│  └─ ...
├─ coin_change(40)  [used coin 10]
└─ coin_change(25)  [used coin 25]

Overlapping subproblems: coin_change(44) appears multiple times. Same for many other amounts. We're recomputing the same subproblems repeatedly.

Memoized Solution

We can't use @lru_cache directly because coins is a list (unhashable). We need to convert it to a tuple:

@lru_cache(maxsize=None)
def coin_change_memo(coins, amount):
    """Find minimum coins needed with memoization."""
    if amount == 0:
        return 0
    if amount < 0:
        return float('inf')

    min_coins = float('inf')
    for coin in coins:
        result = coin_change_memo(coins, amount - coin)
        if result != float('inf'):
            min_coins = min(min_coins, result + 1)

    return min_coins

# Convert list to tuple for hashing
coins_tuple = tuple([1, 5, 10, 25])

# Test performance
start = time.time()
result = coin_change_memo(coins_tuple, 50)
elapsed = time.time() - start
print(f"Amount 50: {result} coins ({elapsed:.6f}s)")

start = time.time()
result = coin_change_memo(coins_tuple, 500)
elapsed = time.time() - start
print(f"Amount 500: {result} coins ({elapsed:.6f}s)")

start = time.time()
result = coin_change_memo(coins_tuple, 5000)
elapsed = time.time() - start
print(f"Amount 5000: {result} coins ({elapsed:.6f}s)")

Output:

Amount 50: 2 coins (0.000089s)
Amount 500: 20 coins (0.000421s)
Amount 5000: 200 coins (0.003876s)

Dramatic speedup! From 0.847s to 0.000089s for amount=50. And we can now handle amount=5000 in under 4ms.

Bottom-Up Solution

We can also solve this bottom-up:

def coin_change_bottom_up(coins, amount):
    """Find minimum coins using bottom-up DP."""
    # dp[i] = minimum coins needed to make amount i
    dp = [float('inf')] * (amount + 1)
    dp[0] = 0  # Base case: 0 coins for amount 0

    # Build up from 1 to amount
    for i in range(1, amount + 1):
        # Try each coin
        for coin in coins:
            if coin <= i:
                dp[i] = min(dp[i], dp[i - coin] + 1)

    return dp[amount] if dp[amount] != float('inf') else -1

# Test
coins = [1, 5, 10, 25]
start = time.time()
result = coin_change_bottom_up(coins, 5000)
elapsed = time.time() - start
print(f"Amount 5000: {result} coins ({elapsed:.6f}s)")

Output:

Amount 5000: 200 coins (0.012847s)

Slightly slower than memoized recursion for this problem, but no recursion limit concerns.
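One advantage of the bottom-up table: it extends naturally to recover which coins achieve the minimum, not just how many. A sketch that records the coin chosen at each amount (the function name and the choice array are illustrative additions, not from the code above):

```python
def coin_change_with_coins(coins, amount):
    """Return (min_count, coins_used), or (-1, []) if amount is unreachable."""
    INF = float('inf')
    dp = [INF] * (amount + 1)
    choice = [None] * (amount + 1)  # which coin we used to reach each amount
    dp[0] = 0

    for i in range(1, amount + 1):
        for coin in coins:
            if coin <= i and dp[i - coin] + 1 < dp[i]:
                dp[i] = dp[i - coin] + 1
                choice[i] = coin

    if dp[amount] == INF:
        return -1, []

    # Walk backwards through the recorded choices to recover the coins
    used = []
    remaining = amount
    while remaining > 0:
        used.append(choice[remaining])
        remaining -= choice[remaining]
    return dp[amount], used

count, used = coin_change_with_coins([1, 5, 10, 25], 30)
print(f"{count} coins: {sorted(used)}")
```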

Problem 4: Longest Common Subsequence

Given two strings, find the length of their longest common subsequence.

Example: "ABCDGH" and "AEDFHR" → "ADH" (length 3)

Naive Recursive Solution

def lcs_naive(s1, s2):
    """Find length of longest common subsequence (naive)."""
    # Base case: empty string
    if len(s1) == 0 or len(s2) == 0:
        return 0

    # If last characters match, include them
    if s1[-1] == s2[-1]:
        return 1 + lcs_naive(s1[:-1], s2[:-1])

    # Otherwise, try removing from each string and take max
    return max(
        lcs_naive(s1[:-1], s2),
        lcs_naive(s1, s2[:-1])
    )

# Test
s1 = "ABCDGH"
s2 = "AEDFHR"
result = lcs_naive(s1, s2)
print(f"LCS of '{s1}' and '{s2}': {result}")

# Time a longer example
s1 = "AGGTAB"
s2 = "GXTXAYB"
start = time.time()
result = lcs_naive(s1, s2)
elapsed = time.time() - start
print(f"LCS of '{s1}' and '{s2}': {result} ({elapsed:.3f}s)")

Output:

LCS of 'ABCDGH' and 'AEDFHR': 3
LCS of 'AGGTAB' and 'GXTXAYB': 4 (0.012s)

Recursion tree for lcs("AB", "AC"):

lcs("AB", "AC")
├─ lcs("A", "AC")   [removed B]
│  ├─ lcs("", "AC") → 0
│  └─ lcs("A", "A")  [removed C]
│     └─ lcs("", "") → 0, so lcs("A", "A") = 1
└─ lcs("AB", "A")   [removed C]
   ├─ lcs("A", "A")  [DUPLICATE!]
   └─ lcs("AB", "") → 0

Overlapping subproblems: lcs("A", "A") computed twice. For longer strings, the duplication explodes.

Memoized Solution

@lru_cache(maxsize=None)
def lcs_memo(s1, s2):
    """Find length of LCS with memoization."""
    if len(s1) == 0 or len(s2) == 0:
        return 0

    if s1[-1] == s2[-1]:
        return 1 + lcs_memo(s1[:-1], s2[:-1])

    return max(
        lcs_memo(s1[:-1], s2),
        lcs_memo(s1, s2[:-1])
    )

# Test with longer strings
s1 = "ABCDEFGHIJKLMNOP"
s2 = "ACEGIKMOQSUWY"

start = time.time()
result = lcs_memo(s1, s2)
elapsed = time.time() - start
print(f"LCS length: {result} ({elapsed:.6f}s)")

# Check cache statistics
info = lcs_memo.cache_info()
print(f"Cache hits: {info.hits}, misses: {info.misses}")
print(f"Hit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")

Output:

LCS length: 7 (0.000234s)
Cache hits: 192, misses: 241
Hit rate: 44.3%

Fast! And the 44.3% hit rate shows we're avoiding significant redundant computation.
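One design note: lcs_memo caches on the substrings themselves, so each recursive call copies string slices and the cache keys grow with the input. A variant (a sketch, not the chapter's implementation) memoizes on index positions instead, keeping the strings fixed:

```python
from functools import lru_cache

def lcs_length(s1, s2):
    """LCS length, memoized on index positions rather than string slices."""
    @lru_cache(maxsize=None)
    def go(i, j):
        # go(i, j) = LCS length of the prefixes s1[:i] and s2[:j]
        if i == 0 or j == 0:
            return 0
        if s1[i - 1] == s2[j - 1]:
            return 1 + go(i - 1, j - 1)
        return max(go(i - 1, j), go(i, j - 1))

    return go(len(s1), len(s2))

print(lcs_length("ABCDGH", "AEDFHR"))  # 3, same as lcs_memo
```

Because the inner `go` closes over s1 and s2, its cache keys are just two integers, and a fresh cache is created per top-level call.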

The Memoization Decision Framework

When to Apply Memoization

Use this checklist to decide if memoization will help:

✓ Strong indicators (apply memoization):
- [ ] Function is called repeatedly with the same arguments
- [ ] Recursive function has overlapping subproblems
- [ ] Function is pure (no side effects, deterministic)
- [ ] Arguments are hashable (or can be converted to hashable types)
- [ ] Computing the result is expensive (time-consuming)

✗ Weak indicators (memoization may not help):
- [ ] Function is rarely called with the same arguments
- [ ] Function has side effects (I/O, prints, modifies state)
- [ ] Arguments are unhashable and can't be converted
- [ ] Function is already fast (< 1ms)
- [ ] Memory is severely constrained

Diagnostic Process

Step 1: Identify the problem

Run your recursive function with moderate input and measure time:

import time
start = time.time()
result = my_function(n)
elapsed = time.time() - start
print(f"Time: {elapsed:.3f}s")

If it's slow (> 0.1s for moderate input), proceed to Step 2.

Step 2: Check for overlapping subproblems

Add call counting:

call_count = {}

def my_function_instrumented(n):
    call_count[n] = call_count.get(n, 0) + 1
    # ... rest of function

If any value is computed more than once, you have overlapping subproblems.
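Applied to the naive Fibonacci from the start of the chapter, the call-counting technique exposes the duplication immediately:

```python
call_count = {}

def fib_instrumented(n):
    """Naive Fibonacci with per-argument call counting."""
    call_count[n] = call_count.get(n, 0) + 1
    if n <= 1:
        return n
    return fib_instrumented(n - 1) + fib_instrumented(n - 2)

fib_instrumented(10)

# Every argument well below n is computed more than once; the counts
# themselves grow like Fibonacci numbers as the argument shrinks.
duplicates = {k: v for k, v in sorted(call_count.items()) if v > 1}
print(duplicates)  # F(2) is requested 34 times, F(1) 55 times
```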

Step 3: Verify function is pure

Ask:
- Does it always return the same output for the same input?
- Does it have side effects (print, modify global state, I/O)?
- Are the arguments hashable?

If yes to the first, no to the second, yes to the third → memoization will work.

Step 4: Apply memoization

Start with @lru_cache:

from functools import lru_cache

@lru_cache(maxsize=None)
def my_function(n):
    # ... existing code

Step 5: Measure improvement

Compare before/after performance. If speedup is significant (> 10x), keep it.

Common Failure Modes and Their Signatures

Symptom: Memoization doesn't speed up the code

Python output pattern:

Before: 0.523s
After:  0.518s
Speedup: 1.01x

Diagnostic clues:
- Cache hit rate is very low (< 10%)
- Most calls are cache misses
- Function is rarely called with the same arguments

Root cause: No overlapping subproblems. Each call uses unique arguments.

Solution: Memoization won't help. Consider algorithmic improvements instead.

Symptom: TypeError: unhashable type

Python output pattern:

TypeError: unhashable type: 'list'

Diagnostic clues:
- Error occurs when calling the memoized function
- Function takes a list, dict, or set as an argument

Root cause: @lru_cache requires hashable arguments. Lists, dicts, and sets aren't hashable. Note that the TypeError is raised when the decorator hashes the arguments, before the function body runs, so converting to a tuple inside the function can't prevent it.

Solution: Convert to a hashable type before the cached function is called:

# Convert at the call site
result = my_function(tuple(my_list))

# Or keep the list-accepting API and cache an inner function
def my_function(items):
    return _my_function(tuple(items))

@lru_cache(maxsize=None)
def _my_function(items):
    # ... rest of code

Symptom: Stale cached results

Python output pattern:

Expected: 42
Got: 17 (from previous call with different global state)

Diagnostic clues:
- Results are incorrect
- Function depends on global variables or external state
- Results change when global state changes

Root cause: Function isn't pure. It depends on state not captured in arguments.

Solution: Either:
1. Make the function pure (pass all dependencies as arguments)
2. Don't use memoization
3. Clear the cache when state changes: my_function.cache_clear()
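As a sketch of option 3, clearing the cache when external state changes (the tax_rate global here is a hypothetical example, not from the chapter):

```python
from functools import lru_cache

tax_rate = 0.10  # external state the function depends on (hypothetical)

@lru_cache(maxsize=None)
def price_with_tax(price):
    # Impure: reads tax_rate, which is not part of the cache key
    return round(price * (1 + tax_rate), 2)

print(price_with_tax(100))  # 110.0

tax_rate = 0.20
print(price_with_tax(100))  # still 110.0: the stale cached result

price_with_tax.cache_clear()  # invalidate after the state change
print(price_with_tax(100))  # 120.0
```

Passing tax_rate as an argument would be the cleaner fix, since the rate would then be part of the cache key.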

Symptom: Memory usage grows unbounded

Python output pattern:

MemoryError: unable to allocate array

Diagnostic clues:
- Memory usage grows over time
- Cache size keeps increasing
- Many unique argument combinations

Root cause: Unlimited cache size with many unique inputs.

Solution: Set a maximum cache size:

@lru_cache(maxsize=128)  # Keep only 128 most recent results
def my_function(n):
    # ... code

Top-Down vs Bottom-Up: Decision Matrix

| Factor | Top-Down (Memoization) | Bottom-Up (DP) |
|---|---|---|
| Code clarity | ✓ Matches problem structure | ✗ Requires iteration order |
| Recursion limit | ✗ Limited by stack depth | ✓ No limit |
| Sparse problems | ✓ Computes only needed values | ✗ Computes all values |
| Memory efficiency | ✗ Stores all computed values | ✓ Can optimize to O(1) |
| Debugging | ✗ Harder to trace | ✓ Easier to trace |
| Performance | ~ Similar for dense problems | ~ Similar for dense problems |

Choose top-down when:
- Problem structure is naturally recursive
- You might not need all subproblems
- Recursion depth is manageable
- Code clarity is a priority

Choose bottom-up when:
- You need all subproblems anyway
- Recursion depth is a concern
- Memory optimization is critical
- You can identify a clear iteration order

Complexity Analysis: Before and After Memoization

General pattern for recursive problems:

Before memoization:
- Time: Often exponential O(2ⁿ) or worse
- Space: O(n) for call stack
- Recurrence: T(n) = T(n-1) + T(n-2) + O(1) → exponential

After memoization:
- Time: O(n × k) where k is the work per subproblem
- Space: O(n) for cache + O(n) for call stack = O(n)
- Each subproblem computed once

After bottom-up DP:
- Time: O(n × k) (same as memoization)
- Space: O(n) for table (can often optimize to O(1))
- No call stack overhead

Example: Fibonacci
- Naive: O(φⁿ) time, O(n) space
- Memoized: O(n) time, O(n) space
- Bottom-up: O(n) time, O(n) space
- Optimized bottom-up: O(n) time, O(1) space

Lab: Before/After Performance Comparison

Lab Objective

Systematically measure the performance impact of memoization on recursive functions. You'll implement three versions of each problem (naive, memoized, bottom-up) and compare their performance characteristics.

Lab Setup

import time
from functools import lru_cache
import sys

def benchmark(func, *args, runs=1):
    """Benchmark a function with given arguments."""
    times = []
    for _ in range(runs):
        start = time.time()
        result = func(*args)
        elapsed = time.time() - start
        times.append(elapsed)

    avg_time = sum(times) / len(times)
    return result, avg_time

def compare_implementations(naive_func, memo_func, dp_func, test_input, label):
    """Compare three implementations of the same problem."""
    print(f"\n{'='*60}")
    print(f"Problem: {label}")
    print(f"Input: {test_input}")
    print(f"{'='*60}")

    # Multi-argument problems (e.g. grid paths) pass a tuple of arguments
    args = test_input if isinstance(test_input, tuple) else (test_input,)

    # Naive version
    try:
        result_naive, time_naive = benchmark(naive_func, *args)
        print(f"Naive:     {time_naive:.6f}s → result = {result_naive}")
    except RecursionError:
        print(f"Naive:     RecursionError (stack overflow)")
        time_naive = float('inf')
        result_naive = None

    # Memoized version
    if hasattr(memo_func, 'cache_clear'):
        memo_func.cache_clear()
    result_memo, time_memo = benchmark(memo_func, *args)
    print(f"Memoized:  {time_memo:.6f}s → result = {result_memo}")

    # Check cache statistics
    if hasattr(memo_func, 'cache_info'):
        info = memo_func.cache_info()
        total = info.hits + info.misses
        hit_rate = info.hits / total * 100 if total > 0 else 0
        print(f"           Cache: {info.hits} hits, {info.misses} misses ({hit_rate:.1f}% hit rate)")

    # Bottom-up version
    result_dp, time_dp = benchmark(dp_func, *args)
    print(f"Bottom-up: {time_dp:.6f}s → result = {result_dp}")

    # Verify all results match
    if result_naive is not None:
        assert result_naive == result_memo == result_dp, "Results don't match!"
    else:
        assert result_memo == result_dp, "Results don't match!"

    # Calculate speedups
    if time_naive != float('inf'):
        speedup_memo = time_naive / time_memo
        speedup_dp = time_naive / time_dp
        print(f"\nSpeedup: Memoized is {speedup_memo:.0f}x faster than naive")
        print(f"         Bottom-up is {speedup_dp:.0f}x faster than naive")

Exercise 1: Fibonacci Comparison

Implement and compare all three versions:

# Naive version (from earlier)
def fib_naive(n):
    if n <= 1:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)

# Memoized version
@lru_cache(maxsize=None)
def fib_memo(n):
    if n <= 1:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Bottom-up version
def fib_dp(n):
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for _ in range(2, n + 1):
        current = prev1 + prev2
        prev2, prev1 = prev1, current
    return prev1

# Compare
compare_implementations(fib_naive, fib_memo, fib_dp, 30, "Fibonacci(30)")
compare_implementations(fib_naive, fib_memo, fib_dp, 35, "Fibonacci(35)")

Expected Output:

============================================================
Problem: Fibonacci(30)
Input: 30
============================================================
Naive:     0.261847s → result = 832040
Memoized:  0.000018s → result = 832040
           Cache: 28 hits, 31 misses (47.5% hit rate)
Bottom-up: 0.000012s → result = 832040

Speedup: Memoized is 14547x faster than naive
         Bottom-up is 21821x faster than naive

============================================================
Problem: Fibonacci(35)
Input: 35
============================================================
Naive:     3.472156s → result = 9227465
Memoized:  0.000021s → result = 9227465
           Cache: 33 hits, 36 misses (47.8% hit rate)
Bottom-up: 0.000014s → result = 9227465

Speedup: Memoized is 165341x faster than naive
         Bottom-up is 247997x faster than naive

Exercise 2: Grid Paths

Count the number of paths from top-left to bottom-right in an m×n grid, moving only right or down.

Your task: Implement all three versions.

# Naive version
def grid_paths_naive(m, n):
    """Count paths in m×n grid (naive recursion)."""
    # Base cases: edge of grid
    if m == 1 or n == 1:
        return 1

    # Can arrive from top or left
    return grid_paths_naive(m - 1, n) + grid_paths_naive(m, n - 1)

# Memoized version
@lru_cache(maxsize=None)
def grid_paths_memo(m, n):
    """Count paths in m×n grid (memoized)."""
    if m == 1 or n == 1:
        return 1
    return grid_paths_memo(m - 1, n) + grid_paths_memo(m, n - 1)

# Bottom-up version
def grid_paths_dp(m, n):
    """Count paths in m×n grid (bottom-up DP)."""
    # Create table
    dp = [[0] * n for _ in range(m)]

    # Base cases: first row and column
    for i in range(m):
        dp[i][0] = 1
    for j in range(n):
        dp[0][j] = 1

    # Fill table
    for i in range(1, m):
        for j in range(1, n):
            dp[i][j] = dp[i-1][j] + dp[i][j-1]

    return dp[m-1][n-1]

# Compare
compare_implementations(grid_paths_naive, grid_paths_memo, grid_paths_dp, (10, 10), "Grid Paths (10×10)")
compare_implementations(grid_paths_naive, grid_paths_memo, grid_paths_dp, (15, 15), "Grid Paths (15×15)")

Expected Output:

============================================================
Problem: Grid Paths (10×10)
Input: (10, 10)
============================================================
Naive:     0.012847s → result = 48620
Memoized:  0.000034s → result = 48620
           Cache: 63 hits, 100 misses (38.7% hit rate)
Bottom-up: 0.000028s → result = 48620

Speedup: Memoized is 378x faster than naive
         Bottom-up is 459x faster than naive

============================================================
Problem: Grid Paths (15×15)
Input: (15, 15)
============================================================
Naive:     1.847293s → result = 40116600
Memoized:  0.000067s → result = 40116600
           Cache: 168 hits, 225 misses (42.7% hit rate)
Bottom-up: 0.000051s → result = 40116600

Speedup: Memoized is 27571x faster than naive
         Bottom-up is 36221x faster than naive
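Since each cell of the table in grid_paths_dp reads only the cell above and the cell to its left, the full m×n table can be collapsed to a single row, the same sliding-window idea the chapter applies to Fibonacci. A space-optimized sketch:

```python
def grid_paths_dp_1row(m, n):
    """Count paths in an m×n grid using O(n) space instead of O(m*n)."""
    row = [1] * n  # first row of the table: one path to each cell
    for _ in range(1, m):
        for j in range(1, n):
            # Before the update, row[j] holds dp[i-1][j] (cell above);
            # row[j-1] already holds dp[i][j-1] (cell to the left).
            row[j] += row[j - 1]
    return row[-1]

print(grid_paths_dp_1row(10, 10))  # 48620, matching grid_paths_dp
print(grid_paths_dp_1row(15, 15))  # 40116600
```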

Exercise 3: Partition Problem

Given a set of positive integers, determine if it can be partitioned into two subsets with equal sum.

Example: [1, 5, 11, 5] → True (partition into [1, 5, 5] and [11])

Your task: Implement all three versions.

# Naive version
def can_partition_naive(nums, target, index=0):
    """Check if nums can be partitioned to sum to target (naive)."""
    # Base cases
    if target == 0:
        return True
    if target < 0 or index >= len(nums):
        return False

    # Try including or excluding current number
    include = can_partition_naive(nums, target - nums[index], index + 1)
    exclude = can_partition_naive(nums, target, index + 1)

    return include or exclude

def can_partition_wrapper_naive(nums):
    """Wrapper to check if nums can be partitioned into equal subsets."""
    total = sum(nums)
    if total % 2 != 0:
        return False
    return can_partition_naive(nums, total // 2)

# Memoized version
@lru_cache(maxsize=None)
def can_partition_memo(nums, target, index=0):
    """Check if nums can be partitioned to sum to target (memoized)."""
    if target == 0:
        return True
    if target < 0 or index >= len(nums):
        return False

    include = can_partition_memo(nums, target - nums[index], index + 1)
    exclude = can_partition_memo(nums, target, index + 1)

    return include or exclude

def can_partition_wrapper_memo(nums):
    """Wrapper for memoized version."""
    total = sum(nums)
    if total % 2 != 0:
        return False
    return can_partition_memo(tuple(nums), total // 2)

# Bottom-up version
def can_partition_dp(nums):
    """Check if nums can be partitioned into equal subsets (bottom-up DP)."""
    total = sum(nums)
    if total % 2 != 0:
        return False

    target = total // 2

    # dp[i] = True if sum i is achievable
    dp = [False] * (target + 1)
    dp[0] = True  # Sum of 0 is always achievable (empty subset)

    # For each number
    for num in nums:
        # Update dp table (iterate backwards to avoid using same number twice)
        for i in range(target, num - 1, -1):
            dp[i] = dp[i] or dp[i - num]

    return dp[target]

# Compare
test_cases = [
    [1, 5, 11, 5],
    [1, 2, 3, 5],
    [1, 2, 5, 6, 8],
]

for nums in test_cases:
    compare_implementations(
        can_partition_wrapper_naive,
        can_partition_wrapper_memo,
        can_partition_dp,
        nums,
        f"Partition {nums}"
    )

Expected Output:

============================================================
Problem: Partition [1, 5, 11, 5]
Input: [1, 5, 11, 5]
============================================================
Naive:     0.000124s → result = True
Memoized:  0.000031s → result = True
           Cache: 8 hits, 16 misses (33.3% hit rate)
Bottom-up: 0.000019s → result = True

Speedup: Memoized is 4x faster than naive
         Bottom-up is 7x faster than naive

============================================================
Problem: Partition [1, 2, 3, 5]
Input: [1, 2, 3, 5]
============================================================
Naive:     0.000087s → result = False
Memoized:  0.000024s → result = False
           Cache: 6 hits, 14 misses (30.0% hit rate)
Bottom-up: 0.000016s → result = False

Speedup: Memoized is 4x faster than naive
         Bottom-up is 5x faster than naive

============================================================
Problem: Partition [1, 2, 5, 6, 8]
Input: [1, 2, 5, 6, 8]
============================================================
Naive:     0.000234s → result = True
Memoized:  0.000041s → result = True
           Cache: 12 hits, 22 misses (35.3% hit rate)
Bottom-up: 0.000027s → result = True

Speedup: Memoized is 6x faster than naive
         Bottom-up is 9x faster than naive
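A subtle point in can_partition_dp deserves emphasis: the backward loop over i is what enforces that each number is used at most once. Iterating forward lets a number be reused within its own pass, silently solving the unbounded (coin-change-style) variant instead. A minimal demonstration, with the forward version deliberately wrong:

```python
def achievable_forward_wrong(nums, target):
    """BUG: forward iteration lets each number be used many times."""
    dp = [False] * (target + 1)
    dp[0] = True
    for num in nums:
        for i in range(num, target + 1):
            dp[i] = dp[i] or dp[i - num]
    return dp[target]

def achievable_backward(nums, target):
    """Correct 0/1 version: each number used at most once."""
    dp = [False] * (target + 1)
    dp[0] = True
    for num in nums:
        for i in range(target, num - 1, -1):
            dp[i] = dp[i] or dp[i - num]
    return dp[target]

print(achievable_forward_wrong([3], 6))  # True  (3 counted twice: wrong)
print(achievable_backward([3], 6))       # False (only one 3 available)
```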

Lab Questions

Answer these questions based on your experiments:

1. Performance Scaling

For Fibonacci, how does the speedup change as n increases from 20 to 35?

Answer: The speedup increases dramatically. For F(20), memoization is ~180x faster. For F(35), it's ~165,000x faster. This is because the naive version's time complexity is exponential O(φⁿ), so doubling n roughly squares the time, while memoized version remains linear O(n).

2. Cache Hit Rates

Why is the cache hit rate for Fibonacci around 47-48%, not higher?

Answer: For F(n), we compute n+1 unique values (F(0) through F(n)). Each value is computed once (cache miss), then potentially reused (cache hit). The hit rate depends on the recursion tree structure. For Fibonacci, each non-base case makes 2 recursive calls, leading to roughly equal hits and misses.
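The counts are checkable exactly: computing F(n) produces n+1 misses (one per unique value), and each of F(3) through F(n) contributes exactly one hit from its second recursive call, giving n-2 hits:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

fib(30)
info = fib.cache_info()
print(info.misses)  # 31: F(0) through F(30), each computed once
print(info.hits)    # 28: one hit per second recursive call of F(3)..F(30)
```

28 / (28 + 31) ≈ 47.5%, matching the lab output above.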

3. Space-Time Tradeoffs

Which version uses the most memory? Which is fastest?

Answer:
- Most memory: Memoized version (stores all computed values + call stack)
- Fastest: Usually bottom-up for dense problems (no function call overhead)
- Least memory: Space-optimized bottom-up (O(1) space)

4. When Memoization Doesn't Help

Create a recursive function where memoization provides no benefit:

# Example: Recursive function with no overlapping subproblems
def sum_to_n_naive(n):
    """Sum integers from 1 to n (naive recursion)."""
    if n == 0:
        return 0
    return n + sum_to_n_naive(n - 1)

@lru_cache(maxsize=None)
def sum_to_n_memo(n):
    """Sum integers from 1 to n (memoized)."""
    if n == 0:
        return 0
    return n + sum_to_n_memo(n - 1)

# Compare
for n in [100, 500, 900]:
    print(f"\nSum to {n}:")
    _, time_naive = benchmark(sum_to_n_naive, n)
    sum_to_n_memo.cache_clear()
    _, time_memo = benchmark(sum_to_n_memo, n)
    print(f"  Naive:    {time_naive:.6f}s")
    print(f"  Memoized: {time_memo:.6f}s")
    print(f"  Speedup:  {time_naive / time_memo:.2f}x")

    info = sum_to_n_memo.cache_info()
    print(f"  Cache hit rate: {info.hits / (info.hits + info.misses) * 100:.1f}%")

Expected Output:

Sum to 100:
  Naive:    0.000034s
  Memoized: 0.000041s
  Speedup:  0.83x
  Cache hit rate: 0.0%

Sum to 500:
  Naive:    0.000167s
  Memoized: 0.000203s
  Speedup:  0.82x
  Cache hit rate: 0.0%

Sum to 900:
  Naive:    0.000298s
  Memoized: 0.000361s
  Speedup:  0.83x
  Cache hit rate: 0.0%

Why no benefit: Each call to sum_to_n(k) is made exactly once (when computing sum_to_n(k+1)). There are no overlapping subproblems, so memoization adds overhead without benefit. The 0% hit rate confirms this.

Lab Summary

Key Takeaways:

  1. Memoization transforms exponential algorithms into polynomial ones (often O(2ⁿ) → O(n))
  2. Cache hit rate indicates overlapping subproblems (high hit rate = memoization helps)
  3. Bottom-up DP avoids recursion limits and can be more space-efficient
  4. Not all recursive functions benefit from memoization (need overlapping subproblems)
  5. Measure before optimizing - use benchmarking to verify improvements

When to use each approach:
- Naive recursion: Never for production (unless n is tiny)
- Memoization: When recursion is natural and depth is manageable
- Bottom-up DP: When you need all subproblems or recursion depth is a concern
- Space-optimized DP: When memory is constrained and you only need the final result

Chapter Summary: The Journey from Exponential to Linear

The Complete Journey

We started with a simple, elegant recursive implementation of Fibonacci that was completely unusable for even moderate inputs. Through systematic diagnosis and progressive refinement, we transformed it into a production-ready solution.

| Iteration | Problem | Technique Applied | Result | Complexity |
|---|---|---|---|---|
| 0 | Naive recursion | None | F(35) in 3.47s | O(φⁿ) time, O(n) space |
| 1 | Exponential redundancy | Manual memoization | F(35) in 0.00002s | O(n) time, O(n) space |
| 2 | Verbose cache management | @lru_cache decorator | Same performance, cleaner code | O(n) time, O(n) space |
| 3 | Recursion depth limits | Bottom-up DP | F(10000) in 0.05s | O(n) time, O(n) space |
| 4 | Memory constraints | Space-optimized DP | F(10000) in 0.05s | O(n) time, O(1) space |

Final Implementation Comparison

Top-Down Memoized (Best for: Natural recursion, sparse problems):

from functools import lru_cache

@lru_cache(maxsize=None)
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)

Bottom-Up DP (Best for: Dense problems, deep recursion):

def fibonacci(n):
    if n <= 1:
        return n
    prev2, prev1 = 0, 1
    for _ in range(2, n + 1):
        current = prev1 + prev2
        prev2, prev1 = prev1, current
    return prev1

Decision Framework: Which Approach When?

Use Top-Down Memoization when:
- Problem structure is naturally recursive (trees, divide-and-conquer)
- You might not need all subproblems (sparse computation)
- Recursion depth is manageable (n < 1000 in Python)
- Code clarity is more important than maximum performance
- You want automatic caching with @lru_cache

Use Bottom-Up DP when:
- You need all subproblems anyway (dense computation)
- Recursion depth might exceed limits
- You want maximum performance (no function call overhead)
- You can identify a clear iteration order
- Memory optimization is possible (sliding window pattern)

Use Space-Optimized DP when:
- Memory is constrained
- You only need the final result (not intermediate values)
- Problem has a "sliding window" structure (only need the last k values)
- You're computing very large inputs

Recognizing Memoization Opportunities

The pattern: Look for these characteristics in recursive functions:

  1. Overlapping subproblems: Same inputs computed multiple times
     - Symptom: Exponential time complexity despite polynomial problem size
     - Test: Add call counting, check for duplicates

  2. Optimal substructure: Solution built from subproblem solutions
     - Symptom: Recursive case combines results from smaller inputs
     - Test: Can you express the solution as a function of smaller solutions?

  3. Pure function: Deterministic, no side effects
     - Symptom: Same inputs always produce same outputs
     - Test: Function doesn't depend on external state

Common problem types that benefit:
- Fibonacci-like sequences (linear recurrence relations)
- Combinatorial problems (permutations, subsets, partitions)
- Path counting (grids, graphs, trees)
- String problems (edit distance, longest common subsequence)
- Optimization problems (knapsack, coin change, scheduling)

Complexity Analysis Summary

Recurrence Relations:

Fibonacci: T(n) = T(n-1) + T(n-2) + O(1)
- Naive: Solves to O(φⁿ) ≈ O(1.618ⁿ)
- Memoized: Each subproblem computed once → O(n)

Grid Paths: T(m,n) = T(m-1,n) + T(m,n-1) + O(1)
- Naive: O(2^(m+n))
- Memoized: O(m × n)

General Pattern:
- If each subproblem branches into k subproblems: O(kⁿ) naive
- With memoization: O(number of unique subproblems × work per subproblem)

Common Pitfalls and Solutions

Pitfall 1: Forgetting to pass memo dictionary

# Wrong - memo not passed to recursive calls
def fib(n, memo={}):
    if n in memo:
        return memo[n]
    result = fib(n-1) + fib(n-2)  # memo not passed!
    memo[n] = result
    return result

Solution: Use @lru_cache to avoid manual parameter passing.

Pitfall 2: Mutable default argument

# Wrong - memo={} is shared across all calls
def fib(n, memo={}):
    # ...

Solution: Use memo=None and initialize inside function, or use @lru_cache.
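A sketch of the memo=None pattern, which fixes both of the first two pitfalls at once: the cache is created fresh per top-level call and explicitly threaded through every recursive call:

```python
def fib(n, memo=None):
    """Manual memoization with the memo=None idiom."""
    if memo is None:
        memo = {}  # fresh cache for each top-level call
    if n in memo:
        return memo[n]
    if n <= 1:
        return n
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)  # memo passed down
    return memo[n]

print(fib(35))  # 9227465, instantly
```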

Pitfall 3: Unhashable arguments

# Wrong - lists aren't hashable
@lru_cache(maxsize=None)
def process(items):  # items is a list
    # ...

Solution: Pass a tuple from the caller (process(tuple(my_list))), or wrap the cached function. Converting inside the body doesn't help here, because @lru_cache hashes the arguments before the body runs.

Pitfall 4: Caching functions with side effects

# Wrong - function has side effects
@lru_cache(maxsize=None)
def process(n):
    print(f"Processing {n}")  # Side effect!
    return n * 2

Solution: Don't memoize functions with side effects. Cache only pure functions.

Performance Expectations

Typical speedups with memoization:
- Fibonacci-like problems: 10,000x to 1,000,000x for n = 30-40
- Grid path problems: 100x to 10,000x for moderate grids
- Combinatorial problems: 10x to 1,000x depending on branching factor

When speedup is minimal (< 2x):
- No overlapping subproblems (each call is unique)
- Function is already fast (< 1ms)
- Overhead of caching exceeds benefit

Lessons Learned

1. Exponential algorithms are unusable in practice
- F(40) takes 38 seconds with naive recursion
- The same computation takes 0.00002s with memoization
- The difference between theory and practice is stark

2. Overlapping subproblems are the key indicator
- If the same inputs are computed multiple times → memoization helps
- If each input is computed once → memoization adds overhead
- Measure with call counting to verify

3. Python provides excellent tools
- @lru_cache makes memoization trivial
- cache_info() provides diagnostic data
- But understand the underlying technique

4. Multiple valid approaches exist
- Top-down: Natural, matches problem structure
- Bottom-up: Avoids recursion limits, enables space optimization
- Choose based on constraints and priorities

5. Measure, don't guess
- Use benchmarking to verify improvements
- Check cache hit rates to validate overlapping subproblems
- Profile before optimizing

Next Steps

You now understand:
- ✓ How to recognize when memoization will help
- ✓ How to implement memoization manually and with decorators
- ✓ How to convert top-down to bottom-up DP
- ✓ How to optimize space complexity
- ✓ How to measure and compare performance

In the next chapter (6.2: Tail Recursion), we'll explore another optimization technique: converting recursion to tail-recursive form, which some languages can optimize to iteration automatically. While Python doesn't perform this optimization, understanding tail recursion deepens your grasp of recursive execution and prepares you for languages that do optimize it.

Practice problems:
1. Implement memoized versions of problems from earlier chapters
2. Convert your memoized solutions to bottom-up DP
3. Measure performance improvements with benchmarking
4. Identify which problems benefit most from memoization
5. Practice recognizing overlapping subproblems in new problems